Combining Machine Learning with Dictionary Lookup for Chemical Compound and Drug Name Recognition Task
Abstract
Following the interest in named entity recognition for academic literature generated by the gene mention recognition tasks of BioCreative I and II, BioCreative IV extends this work to detecting mentions of chemical compounds and drugs. Machine learning methods have been highly successful at identifying gene and protein names, and dictionary lookup can cope with the variable naming conventions of chemical and drug names. We therefore combine the two approaches by using dictionary lookup results as features for the machine learning model. Our system is based on Conditional Random Fields (CRF).
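The idea of feeding dictionary lookup results to a sequence model as features can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration and does not come from the paper: the toy lexicon `LEXICON`, whitespace tokenization, the greedy longest-match strategy, and the feature names such as `dict_tag`. It only builds per-token feature dictionaries and does not train a CRF.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy lexicon; the paper's actual dictionary is not described here.
LEXICON: Set[Tuple[str, ...]] = {
    ("aspirin",),
    ("acetylsalicylic", "acid"),
    ("sodium", "chloride"),
}
MAX_LEN = max(len(entry) for entry in LEXICON)


def dictionary_tags(tokens: List[str]) -> List[str]:
    """Greedy longest-match lookup; returns a B/I/O dictionary tag per token."""
    lowered = [t.lower() for t in tokens]
    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        matched = 0
        # Try the longest candidate span first.
        for n in range(min(MAX_LEN, len(tokens) - i), 0, -1):
            if tuple(lowered[i:i + n]) in LEXICON:
                matched = n
                break
        if matched:
            tags[i] = "B"
            for j in range(i + 1, i + matched):
                tags[j] = "I"
            i += matched
        else:
            i += 1
    return tags


def token_features(tokens: List[str]) -> List[Dict[str, object]]:
    """Build one feature dict per token, including the dictionary tag."""
    dict_tags = dictionary_tags(tokens)
    features = []
    for i, tok in enumerate(tokens):
        features.append({
            "lower": tok.lower(),
            "is_capitalized": tok[:1].isupper(),
            "has_digit": any(c.isdigit() for c in tok),
            "suffix3": tok[-3:].lower(),
            # Dictionary evidence enters the model as ordinary features.
            "dict_tag": dict_tags[i],
            "prev_dict_tag": dict_tags[i - 1] if i > 0 else "BOS",
        })
    return features


sentence = "Patients received acetylsalicylic acid and sodium chloride".split()
feats = token_features(sentence)
```

Feature dictionaries of this shape could then be passed, one list per sentence, to a CRF toolkit that accepts per-token feature mappings. Because the dictionary match is a feature and not a final decision, the model can learn when to trust the lexicon and when surrounding context should override it.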
Related articles
Chemical name recognition with harmonized feature-rich conditional random fields
This article presents a machine learning-based solution for automatic chemical and drug name recognition on scientific documents, which was applied in the BioCreative IV CHEMDNER task, namely in the chemical entity mention recognition (CEM) and the chemical document indexing (CDI) sub-tasks. The proposed approach applies conditional random fields with a rich feature set, including linguistic, o...
DBCHEM: A Database Query Based Solution for the Chemical Compound and Drug Name Recognition Task
We propose a method, named DBCHEM, based on database queries for the chemical compound and drug name recognition task of the BioCreative IV challenge. We prepared a database with 145 million entries containing compound and drug names, their synonyms, and molecular formulas. PubChem Power User Gateway (PUG) system is used to construct the database. Candidate chemical and drug names are identifie...
Patent mining: combining dictionary-based and machine-learning approaches
Exploration of the chemical patent space is essential for early-stage medicinal chemistry activities. The BioCreative CHEMDNER-patents task focuses on the recognition of chemical compounds in patents. This includes recognition of chemical named entities in patents (CEMP), classification of chemical-related patent titles and abstracts (CPD), and recognition of genes and proteins in patent abstra...
Exploring Extensions to Machine-learning based Gene Normalisation
One of the foundational text-mining tasks in the biomedical domain is the identification of genes and protein names in journal papers. However, the ambiguous nature of gene names means that the performance of information management tasks such as query-based retrieval will suffer if gene name mentions are not explicitly mapped back to a unique identifier in order to resolve issues relating to sy...
Recognition of chemical entities: combining dictionary-based and grammar-based approaches
Background: The past decade has seen an upsurge in the number of publications in chemistry. The ever-swelling volume of available documents makes it increasingly hard to extract relevant new information from such unstructured texts. The BioCreative CHEMDNER challenge invites the development of systems for the automatic recognition of chemicals in text (CEM task) and for ranking the recognized co...
Publication date: 2013